Visualization toolkits for enriching meta-analyses through evidence map, bibliometrics, and alternative impact metrics
Visualizing study characteristics, hidden risk of bias, societal influence, and research translation
Contributors
Yefeng Yang, Malgorzata Lagisz, Shinichi Nakagawa
Update
Last update Feb 2024
Preface
The R script provided with this html reproduces all the
figures in the main text. For the explanation of each figure, see the
main text. Version information of each package used in this
html is listed at the end. The relevant GitHub repository
can be found at
https://github.com/Yefeng0920/MA_Map_Bib.
Note:
If you wish to use the provided R code to analyze your
own data, we recommend that you carefully read the functions and
packages provided below. In other words, if you want to report this
useful data and add values in a meta-analysis, you need to spend enough
time to follow the instructions in this web page (yes, there is no such
thing as a free lunch). Basically, you need to have some experience
writing R code. If you have spent enough time and still get stuck, you
can look at the end of this page where we provide provide instructions
on how to generalize and translate the example R script to
your own data leveraging the Large Language Model (LLM).
Citation
If our paper and R script have helped you, please cite
the following paper:
Yefeng Yang, Malgorzata Lagisz, Shinichi Nakagawa. Enriching meta-analyses through scoping review, bibliometrics, and alternative impact metrics: Visualizing study characteristics, hidden risk of bias, societal influence, and research translation. arXiv, 2023.
Contact
If you have any questions, mistakes, or bug to report, please contact corresponding authors:
- Dr. Yefeng Yang
Evolution & Ecology Research Centre, EERC School of Biological, Earth and Environmental Sciences, BEES The University of New South Wales, Sydney, Australia
Email: yefeng.yang1@unsw.edu.au
- Professor Shinichi Nakagawa, PhD, FRSN
Evolution & Ecology Research Centre, EERC School of Biological, Earth and Environmental Sciences, BEES The University of New South Wales, Sydney, Australia
Email: s.nakagawa@unsw.edu.au
Set-up
Setting global options will apply those options to all of the following chunks of code in the tutorial.
Our illustrations use R statistical software and existing R packages, which you will first need to download and install.
If you do not have it on your machine, first install R
(download). We recommend also
downloading RStudio, a popular integrated development
environment for coding with R, created by a company named
posit (download).
After installing R, you must install several packages
that contain necessary functions for performing the analyses in this
tutorial. If the packages are archived in CRAN, use
install.packages() to install them. To install packages
that are not on CRAN and archived in Github repositories, execute
devtools::install_github().
The package list is as follows:
tidyverse, here, DT,
ggpubr, readxl, metafor,
lme4, car, ggplot2,
cowplot, visdat, naniar,
viridis, ggthemr, pander,
formatR, rotl, ape,
ggstance, ggtree, flextable,
bibliometrix, circlize,
igraph.
# tidy rm(list=ls()) Install and load necessary library
pacman::p_load(knitr, rmdformats, tidyverse, here, DT, ggpubr, readxl, metafor, lme4,
car, ggplot2, viridis, ggthemr, pander, formatR, rotl, cowplot, ape, ggstance,
ggtree, flextable, bibliometrix, circlize, igraph)Custom function
We provide helper functions necessary for our illustrations. You can
load these custom functions from the .R file in
Function sub-directory using the source function and
the code provided below. Alternatively, you can copy and paste the code
of these functions into your console and execute the code them to load
functions into your computer’s working memory.
# custom function
source(here("Function", "custom.R"))Enrichment 1: Evidence mapping
The first enrichment analysis is evidence mapping (sometimes termed scoping review, evidence review). It contains three types of plots: (1) grid-like plots (Figure I), (2) Sankey diagrams (Figure II), and (3) phylogenetic trees (Figure III).
Grid-like plots
We use Data 1 in the main text to illustrate how to make
the grid-like plots (Figure I). Data 1 was originally
collected by Hodkinson et al. (1), who conducted a network
meta-analysis to assess the efficacy of different self-management
interventions (multidisciplinary case management, regularly supported
self-management, and minimally supported self-management) in enhancing
the quality of life among asthma patients. Note that the following
R code was adapted from (2).
Data 1
# load data
dat <- read_xlsx(here("Data", "Hodkinson_2020.xlsx"))
# preprocess
dat$vi <- dat$`Std Err2`^2
dat <- dat[, c("Intervention model", "Outcome", "Hedgesg", "vi", "Study name")]
names(dat) <- c("dimension1", "dimension2", "yi", "vi", "study_id")
# show data in a datatable
t1 <- dat %>%
dfround(3) %>%
DT::datatable()
t1Visualization
Figure I shown in the main text can be made with:
# get estimate for each cell
est_dat <- dat %>% group_by(dimension1, dimension2) %>%
group_modify(~ meta_aggregate(.x, rho = 0.5)) %>% ungroup()
# traditional map with the number of study
est_dat$dimension1 <- as.factor(est_dat$dimension1)
Box1_map1 <- ggplot(est_dat, aes(x = dimension1, y = dimension2, size = n_studies) ) +
geom_point(alpha = 0.5, color = RColorBrewer::brewer.pal(n = 8, name = "Dark2")[1]) +
labs(x = "Dimension 1 (Intervention)", y = "Dimension 2 (Outcome)") +
scale_size(range=c(5,10)) +
scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) + #
scale_y_discrete(labels = function(x) str_wrap(x, width = 10)) +
theme_bw() +
guides(size = "none") +
theme(legend.position='top',
legend.justification='right',
legend.direction='horizontal',
axis.text = element_text(color = "black"),
axis.title = element_text(color = "black")) +
geom_text(aes(label = as.character(n_studies)), size = 4, color = "gray10") +
labs(caption = "The value in the cell is the number of studies") +
theme(plot.caption = element_text(size = 10, color = "gray10", face = "italic"),
axis.text = element_text(size = 12),
axis.title = element_text(size = 12, face = "bold"))
# traditional map with the number of effect size
Box1_map2 <- ggplot(est_dat, aes(x = dimension1, y = dimension2, size = n_es)) +
geom_point(alpha = 0.5, color = RColorBrewer::brewer.pal(n = 8, name = "Dark2")[3]) +
labs(x = "Dimension 1 (Intervention)", y = "Dimension 2 (Outcome)") +
scale_size(range=c(5,10)) +
scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) +
scale_y_discrete(labels = function(x) str_wrap(x, width = 10)) +
theme_bw() +
guides(size = "none") +
theme(legend.position='top',
legend.justification='right',
legend.direction='horizontal',
axis.text = element_text(color = "black"),
axis.title = element_text(color = "black")) +
geom_text(aes(label = as.character(n_es)), size = 4, color = "gray10") +
labs(caption = "The value in the cell is the number of effect sizes") +
theme(plot.caption = element_text(size = 10, color = "gray10", face = "italic"),
axis.text = element_text(size = 12),
axis.title = element_text(face = "bold", size = 12))
# with effect size information
Box1_map3 <- ggplot(est_dat, aes(x = dimension1, y = dimension2, size = n_es, color = estimate)) +
geom_point(alpha = 0.6) +
scale_color_gradient(
low = "blue",
high = "red",
limits = c(-1,1),
guide = "colourbar") +
labs(x = "Dimension 1 (Intervention)", y = "Dimension 2 (Outcome)", color = "Meta-analytic mean effect size") +
scale_size(range=c(5,10)) +
scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) +
scale_y_discrete(labels = function(x) str_wrap(x, width = 10)) +
theme_bw() +
guides(size = "none") +
theme(legend.position='top',
legend.justification='right',
legend.direction='horizontal',
axis.text = element_text(color = "black"),
axis.title = element_text(color = "black")) +
geom_text(aes(label = as.character(n_es)), size = 4, color = "gray10") +
labs(caption = "The value in the cell is the number of effect sizes") +
theme(plot.caption = element_text(size = 10, color = "gray10", face = "italic"),
axis.text = element_text(size = 12),
axis.title = element_text(face = "bold", size = 12))
# 3-dimensional graph - we do not recommend to 3 dimensional graph because this is not that friendly to end-user
dat <- read_xlsx(here("Data","Hodkinson_2020.xlsx"))
dat$vi <- dat$`Std Err2`^2
dat <- dat[, c("Intervention model", "Outcome", "Age group", "Hedgesg", "vi", "Study name")]
names(dat) <- c("dimension1", "dimension2", "dimension3", "yi", "vi", "study_id")
# get estimate each cell
est_dat <- dat %>% group_by(dimension1, dimension2, dimension3) %>%
group_modify(~ meta_aggregate(.x, rho = 0.5)) %>% ungroup()
# with the third dimension
Box1_map4 <- ggplot(est_dat, aes(x = dimension1, y = dimension2,
shape = dimension3,
size = n_es
)) +
geom_point(alpha = 0.6, color = RColorBrewer::brewer.pal(n = 8, name = "Dark2")[5], aes(group = dimension3), position = position_dodge(width= 0.5)) +
scale_color_gradient(
low = "blue",
high = "red",
limits = c(-1,1),
guide = "colourbar") +
scale_shape_manual(values = c(15, 19, 17, 18, 10, 4, 3, 7, 8, 13)) +
labs(x = "Dimension 1 (Intervention)", y = "Dimension 2 (Outcome)", shape = "Diamension 3 (Population)") +
scale_size(range=c(5,10)) +
scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) +
scale_y_discrete(labels = function(x) str_wrap(x, width = 10)) +
theme_bw() +
guides(size = "none") +
theme(legend.position='top',
legend.justification='right',
legend.direction='horizontal',
axis.text = element_text(color = "black"),
axis.title = element_text(color = "black")) +
geom_text(aes(label = as.character(n_es), group = dimension3), size = 4, color = "gray10", position = position_dodge(width = .5)) +
labs(caption = "The value in the cell is the number of effect sizes") +
theme(plot.caption = element_text(size = 10, color = "gray10", face = "italic"),
axis.text = element_text(size = 12),
axis.title = element_text(face = "bold", size = 12))
#Box1_map1 + Box1_map2 + Box1_map3 + Box1_map4 + patchwork::plot_layout(ncol = 2, nrow = 2) + patchwork::plot_annotation(tag_levels = "A")
#png(filename = "Box1_map.png", width = 10, height = 10, units = "in", type = "windows", res = 400)
plot_grid(Box1_map1, Box1_map2, Box1_map3, Box1_map4, labels = c('A','B','C','D'), label_size = 14, nrow = 2, ncol = 2)Alluvial diagrams
We used Data 2 in the main text to illustrate how to
make the Sankey diagrams (Figure II). Data 2 was originally
collected by Mertens et al. (3),
who employed a multilevel meta-analytic model to synthesize evidence on
the effectiveness of choice architecture interventions (often referred
to as nudges) for behaviour change across various techniques,
behavioural domains, and other study characteristics (e.g., populations
and locations).
Data 2
# load data
dat <- read.csv(here("Data", "Mertens_2021.csv"))
names(dat)[1] <- "publication_id"
# show data in a data
t2 <- dat %>%
dfround(3) %>%
DT::datatable()
t2Visualization
Figure II can be made with:
# pre-process
dat <- dat %>%
dplyr::select(type_experiment, intervention_technique, intervention_category,
domain, population, location)
# png(filename = './Box1_sankey.png', width = 6, height = 5, units = 'in', type
# = 'windows', res = 400)
ggplot(dlong(dat, type_experiment, location, population, intervention_category, domain,
intervention_technique), aes(x = x, next_x = next_x, node = node, next_node = next_node,
fill = factor(node), label = node)) + sankey_p(flow.alpha = 0.8, node.color = "transparent") +
sankey_p_label(size = 3, color = "white", fill = "gray10", alpha = 0.6) + ggsci::scale_color_tron() +
theme_sankey(base_size = 10) + labs(x = NULL) + theme(legend.position = "none",
plot.title = element_text(hjust = 0.5), axis.text.x = element_text(color = "black",
size = 9)) + scale_x_discrete(labels = c("Moderator 1 \n (Experimental \n approach)",
"Moderator 2 \n (Geographical \n location)", "Moderator 3 \n (Population \n characteristic)",
"Moderator 4 \n (Architecture \n category)", "Moderator 5 \n (Behavioral \n domain)",
"Moderator 6 \n (Intervention \n technique)"), position = "top")Phylogenetic trees
We used Data 3 in the main text to illustrate how to
make the phylogenetic trees (Figure III). Data 3 was
originally collected by Sanders et al. (4) , who used a Bayesian meta-analytic
model to synthesize evidence regarding the impacts of artificial light
at night on physiological, phenological, life history, activity
patterns, and population/community-based outcomes. This meta-analysis
included more than 180 species. We only used the physiological
outcomes.
Data 3
# load data
dat <- read_xlsx(here("Data", "Sanders_2021.xlsx"))
# show data in a data
t3 <- dat %>%
dfround(3) %>%
DT::datatable()
t3Visualization
Figure III can be made with:
# some data wrangling
dat$Species <- gsub("\\.", "_", dat$Species)
dat <- dat[!is.na(dat$Species), ] #
# check species we have length(unique(dat$Species)) #184 unique species names,
# if no misspelling
# find Open Tree Taxonomy (OTT) IDs for each species
taxa <- tnrs_match_names(names = unique(dat$Species))
# for illustrative purpose, we just delete the species that are not matched
# find location
pos <- which(dat$Species %in% c("Phodopus_sungeroru", "Electra?__sp", "Molgula_sp",
"Baetis_spp", "Agrostis_tenuis", "Anthoxanthum_odoratum", "Myotis_daubentonii",
"_Myotis_mystacinus", "_Myotis_brandtii", "_Myotis_nattereri", "_Plecotus_auritus",
"Pipistrellus_pipistrelles", "_Pipistrellus_nathusii", "Nyctalus_nyctalus", "_Nyctalus_leisleri",
"_Eptesicus_serotinus_", "All_families", "Pipistrellus_hesperidus/Hypsugo_anchietaia",
"Myotis?_sp", "MCF7", "7288cct", "A__cristatellus_", "A__evermanni", "A__gundlachi",
"A__sagrei", "bat_community_", "N__leisleri", "Poecile_sp", "Coleoptera", "_Diptera",
"_Lepidoptera", "_Erebidae", "_Chironomidae", "_Noctuidae_and_Psychodidae", "Nyctalus_and_Eptesicus_spp",
"Anoectochilus_roxburghii", "Myotis_pilosatibialis", "Silene_latifolia"))
dat <- dat[-pos, ]
# match again
taxa <- tnrs_match_names(names = unique(dat$Species)) # still some are not matched. But let's resolve it latter
# check whether occur in the synthetic tree
ott_in_tree <- ott_id(taxa)[is_in_tree(ott_id(taxa))]
# length(ott_id(taxa)) - length(is.na(ott_in_tree)) # still 10 did not appear
# in synthetic tree
# check which 10
out_tree <- filter(taxa, !ott_id %in% ott_in_tree)
# out_tree$search_string
dat <- dat[!dat$Species %in% c("Leptocythere_pellucida", "Myotis_daubentonii,_Myotis_mystacinus,_Myotis_brandtii,_Myotis_nattereri,_Plecotus_auritus",
"Pipistrellus_pipistrelles,_Pipistrellus_nathusii", "Nyctalus_nyctalus,_Nyctalus_leisleri,_Eptesicus_serotinus_",
"Myotis_lucifugus", "Acyrthosiphont", "Carabidae", "Staphylinidae", "Eptesicus_bottae",
"Corethrella_spp", "P__pipistrellus", "Myotis", "Coleoptera,_Diptera,_Lepidoptera,_Erebidae,_Chironomidae,_Noctuidae_and_Psychodidae",
"Pipistrellus"), ]
# match again
taxa <- tnrs_match_names(names = unique(dat$Species))
# check again whether all otts occur in the synthetic tree
ott_in_tree <- ott_id(taxa)[is_in_tree(ott_id(taxa))]
# length(ott_id(taxa)) - length(is.na(ott_in_tree)) # all good
# now every ott occur in the synthesistic tree. But for the sake of brevity, we
# only visualize a subset of the tree only use studies measuring physiology
Physiology <- subset(dat, Category == "Physiology")
# remove NA
Physiology <- subset(Physiology, Species != "NA")
taxa <- tnrs_match_names(names = unique(Physiology$Species))
# make phylo tree
tree <- suppressWarnings(tol_induced_subtree(ott_ids = ott_id(taxa)))# the tip labels contain OTTs, which means they will not perfectly match the
# species names in our dataset or the taxon map that we created earlier, remove
# the extra information from the tip labels later; with the IDs removed, we can
# use our taxon map to replace the tip labels in the tree with the species
# names from dataset
tree$tip.label <- strip_ott_ids(tree$tip.label, remove_underscores = TRUE)
# decapitalise species names to match with the search string names in taxa
Physiology <- Physiology %>%
mutate(search_string = tolower(Species))
# align data
Physiology <- left_join(Physiology, dplyr::select(taxa, search_string, unique_name,
ott_id), by = "search_string")
# create the variables of spp and phylo
Physiology <- Physiology %>%
mutate(spp = unique_name, phylo = unique_name)
# prepare annotation data species-specific estimates as fixed effect
agg.es <- escalc(measure = "SMD", m1i = Experimental_Mean, m2i = Control_Mean, sd1i = Experimental_SD,
sd2i = Control_SD, n1i = Experimental_N, n2i = Control_N, data = Physiology) %>%
aggregate(cluster = phylo, struct = "CS", rho = 0.5, addk = TRUE)
agg.es <- agg.es[c("phylo", "yi", "vi")]
names(agg.es) <- c("Species", "Mean", "SE")
agg.es <- agg.es %>%
mutate(Lower_bound = Mean - sqrt(SE) * qnorm(0.975), Upper_bound = Mean + sqrt(SE) *
qnorm(0.975)) %>%
arrange(Species)
N_obs <- Physiology %>%
group_by(phylo) %>%
summarise(N_obs = n()) # calculate sample size
names(N_obs)[1] <- "Species"
fe.spp.es <- left_join(agg.es, N_obs, by = "Species")
tip.label <- data.frame(Species = tree$tip.label) # extract tip label
fe.spp.es2 <- left_join(tip.label, fe.spp.es, by = "Species")
fe.spp.es2 <- fe.spp.es2 %>%
mutate(z = Mean/SE, p = pnorm(abs(z), lower.tail = F) * 2)
# get class data
spp.class <- dplyr::distinct(Physiology, phylo, .keep_all = TRUE) %>%
dplyr::select(Class, phylo)
names(spp.class)[2] <- "Species"
spp.class2 <- left_join(tip.label, spp.class, by = "Species")
# make each panel
tree.p1 <- ggtree(tree, layout = "rectangular", cex = 0.4)
tree.p2 <- tree.p1 %<+% spp.class2 + geom_tiplab(aes(color = Class), size = 3, fontface = "italic",
align = T, offset = 0.3) + geom_tippoint(aes(color = Class)) + xlim_expand(c(0,
21), panel = "Tree") + guides(color = "none") + scale_color_viridis_d()
tree.p3 <- tree.p2 + geom_facet(panel = "Effect size", data = fe.spp.es2, geom = ggstance::geom_pointrangeh,
mapping = aes(x = Mean, xmin = Lower_bound, xmax = Upper_bound, color = Class)) +
theme_tree2() + theme(strip.background = element_rect(fill = "white"))
# png(filename = 'Box1_phylo.png', width = 6, height = 5, units = 'in', type =
# 'windows', res = 400)
facet_widths(tree.p3, c(Tree = 0.4, `Effect size` = 0.2))Enrichment 2: Bibliometric analysis
The second enrichment analysis is bibliometric anlaysis. It can produce two main graphs: (1) Co-authorship network (Figure IV), and (2) Country network (Figure V).
Country network
The same bibliometric data can be to illustrate the construction of country network (Figure V) in meta-analytic evidence base.
Data 3
# country
M_country <- metaTagExtraction(M, Field = "AU_CO", sep = ";")
NetMatrix <- biblioNetwork(M_country, analysis = "coupling", network = "countries",
sep = ";")
net_matrix <- as.matrix(NetMatrix)
# get rid of collaboration with same country
diag(net_matrix) <- 0
# getting rid of lower triangle
net_matrix[lower.tri(net_matrix)] <- 0
# colnames(net_matrix) - change to title case:
colnames(net_matrix) <- str_to_title(colnames(net_matrix))
# rownames(net_matrix) - change to title case:
rownames(net_matrix) <- str_to_title(rownames(net_matrix))
# Fix 'Usa' to 'United States' :
colnames(net_matrix)[colnames(net_matrix) == "Usa"] <- "United States"
rownames(net_matrix)[rownames(net_matrix) == "Usa"] <- "United States"Visualization
Figure V can be made with:
# color palette
color <- viridis::viridis(34, alpha = 1, option = "D")
color <- color[sample(1:34)]
circos.clear()
circos.par(start.degree = 90,
gap.degree = 3,
points.overflow.warning = FALSE)
#png(filename = "Box2_country.png", width = 5, height = 5, units = "in", type = "windows", res = 400)
chordDiagram(net_matrix,
grid.col = color, transparency = 0.1,
directional = 1,direction.type = c("arrows", "diffHeight"), diffHeight = -0.04, annotationTrack = "grid", annotationTrackHeight = c(0.05, 0.1),link.arr.type = "big.arrow",link.sort = TRUE, link.largest.ontop = TRUE,
preAllocateTracks = 1)
# add text and axis
circos.trackPlotRegion(track.index = 1, panel.fun = function(x, y) {
xlim = get.cell.meta.data("xlim")
ylim = get.cell.meta.data("ylim")
sector.index = get.cell.meta.data("sector.index")
# add names to the sector.
circos.text(x = mean(xlim), y = ylim[1] + .01,
labels = sector.index, facing = "clockwise", # clockwise
niceFacing = TRUE, cex = 0.6, adj = c(0, 0.5))
# add graduation on axis
circos.axis(h = "top", labels.cex = 0.01, major.tick.length = 0.01, sector.index = sector.index, track.index = 2, labels.niceFacing = T)
}, bg.border = NA)Enrichment 3: Altmetric analyses
We used Data 4 in the main text to illustrate how to
visualize the alternative impact metrics (Figure VI).
Data 4 was originally used to examine the replicability of
the preclinical cancer biology (6).
Data 4
# load meta-analytic data
dat <- suppressMessages(read_csv(here("Data", "Timothy_2021.csv")))
# use a custom function to extract data using API because it takes a while to
# extract data using API, we just upload the pre-extracted data.
altmetrics <- read.csv(here("Data", "altmetrics.csv"))
#-------------------not run-------------------#
# altmetric.crawler <- list(NULL) for (n in 1:length(dat$DOI)) { # JASON format
# altmetric.crawler[[n]] <- try(list(format.Altmetric(getAltmetrics(doi =
# dat$DOI[n]))),silent=TRUE) }
# get lists within lists altmetric.crawler2 <- sapply(altmetric.crawler,
# function(x) {x}) retrieve stats altmetrics <-
# altmetric_summary(altmetric.crawler2) save(data, file = 'data.Rdata')
# write.csv(altmetrics,file = 'Data/altmetrics.csv')
#-------------------not run-------------------#
# show data in a datatable
t4 <- altmetrics %>%
DT::datatable()
t4Visualization
Figure VI can be made with:
altmetrics2 <- altmetrics %>%
distinct(paper, .keep_all = TRUE)
altmetrics2 <- altmetrics2 %>%
mutate(count = Policy + Patent)
altmetrics2 <- altmetrics2 %>%
mutate(group = rep("", nrow(altmetrics2)))
# png(filename = 'Box3_altmetric1.png', width = 5, height = 3, units = 'in',
# type = 'windows', res = 400)
Box3_altmetric1 <- ggplot2::ggplot() + ggbeeswarm::geom_quasirandom(data = altmetrics2,
ggplot2::aes(y = Altmetric.score, x = group, size = count), fill = "#1B9E77",
col = "#999999", alpha = 0.8, shape = 21) + ylim(0, 300) + ggplot2::coord_flip() +
ggplot2::theme_bw() + ggplot2::guides(fill = "none", colour = "none") + ggplot2::theme(legend.position = c(0,
1), legend.justification = c(0, 1)) + ggplot2::theme(legend.title = ggplot2::element_text(size = 9)) +
ggplot2::theme(legend.direction = "horizontal") + ggplot2::theme(legend.background = ggplot2::element_blank()) +
ggplot2::labs(x = "Social media interest", y = "Altmetric score", size = latex2exp::TeX("Policy and patent citations")) +
# ggplot2::theme(axis.ticks.y = element_blank()) +
ggplot2::theme(axis.text.y = ggplot2::element_text(size = 10, colour = "black", hjust = 0.5,
angle = 0))
# add altmetrics and transnational info to an evidence map load meta-analytic
# data
dat <- suppressMessages(read_csv(here("Data", "Timothy_2021.csv")))
altmetrics <- read.csv(here("Data", "altmetrics.csv"))
# add altmetrics to the original data
dat <- dat %>%
mutate(Altmetric.score = altmetrics$Altmetric.score, policy = altmetrics$Policy,
patent = altmetrics$Patent)
dat <- dat[, c("Original paper journal", "Replication study fully completed", "Altmetric.score",
"Paper #")]
names(dat) <- c("dimension1", "dimension2", "score", "study_id")
dat$score <- round(dat$score, 0)
# delete the cases where the replications have not be conducted due to the lack
# of experimental protocol details
dat <- dat[!dat$dimension2 == "Not applicable", ]
# run altmetric_aggregate function through each combination
est_dat <- dat %>%
group_by(dimension1, dimension2) %>%
group_modify(~altmetric_aggregate(.x)) %>%
ungroup()
est_dat1 <- est_dat
# policy and patent counts
dat <- suppressMessages(read_csv(here("Data", "Timothy_2021.csv")))
# add altemtrics to the original data
dat <- dat %>%
mutate(Altmetric.score = altmetrics$Altmetric.score, policy = altmetrics$Policy,
patent = altmetrics$Patent)
dat <- dat %>%
mutate(count = patent + policy)
dat <- dat[, c("Original paper journal", "Replication study fully completed", "count",
"Paper #")]
names(dat) <- c("dimension1", "dimension2", "count", "study_id")
dat$count <- round(dat$count, 0)
# delete the cases where the replications have not be conducted due to the lack
# of experimental protocol details
dat <- dat[!dat$dimension2 == "Not applicable", ]
# run translation_aggregate function through each cell
est_dat <- dat %>%
group_by(dimension1, dimension2) %>%
group_modify(~translation_aggregate(.x)) %>%
ungroup()
est_dat2 <- est_dat
colnames(est_dat2)[colnames(est_dat2) == "estimate"] <- "count"
est_dat2 <- est_dat2[, 1:3]
# combine
est_dat <- merge(est_dat1, est_dat2, by = c("dimension1", "dimension2"))
est_dat$count <- round(est_dat$count, 1)
est_dat <- est_dat %>%
mutate(dimension2 = case_when(dimension2 == "No replication" ~ "Incomplete replication",
dimension2 == "Full replication" ~ "Full replication", dimension2 == "Partial replication" ~
"Partial replication"))
# png(filename = 'Box3_altmetric2.png', width = 6, height = 6, units = 'in',
# type = 'windows', res = 400)
Box3_altmetric2 <- ggplot(est_dat, aes(x = dimension1, y = dimension2, size = count,
color = estimate)) + geom_point(alpha = 0.6) + scale_color_gradient(low = "#E6AB02",
high = "purple", limits = c(0, 400), guide = "colourbar") + labs(x = "Dimension 1 (Journal)",
y = " Dimension 2 (Replication completion)", color = "Altmetric score") + scale_size_identity() +
scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) + scale_y_discrete(labels = function(x) str_wrap(x,
width = 10)) + theme_bw() + guides(size = "none") + theme(legend.position = "top",
legend.justification = "right", legend.direction = "horizontal", axis.text = element_text(color = "black"),
axis.title = element_text(color = "black")) + geom_text(aes(label = as.character(count)),
size = 2.5, color = "gray10", fontface = "bold") + labs(caption = "The value in the cell is the citation count of policies and patents") +
theme(plot.caption = element_text(size = 8, color = "gray10", face = "italic"))
# png(filename = 'Box3_altmetric.png', width = 5, height = 7, units = 'in',
# type = 'windows', res = 400)
plot_grid(Box3_altmetric1, Box3_altmetric2, labels = c("A", "B"), label_size = 14,
nrow = 2, ncol = 1, rel_heights = c(1, 2))Leveraing the Large Language Model to make your own figures
For readers who are not able to generalize and translate the example
R script to their own data, we provide instructions on how
to use a Large Language Model (LLM) to generalize and translate the
example R script to your own data.
Specifically, we will use the exercise data,
(Data_exercise1) (7)
and (Data_exercise2)(8) for illustration, and show how to
use three LLMs (Open AI ChatGPT, Google
Gemini, and Microsoft Copilot) to draw the
main enrichment figures shown above.
ChatGPT
Grid-like plots
First, you need to upload your own data (in our case,
Data_exercise1). For simplicity, we copy the data
Data_exercise1 as dat.
# load data
dat <- read_csv(here("Data", "Data_exercise1.csv"))
# only select necessary variables
dat <- select(dat, id, Study, Flight, Bird.Type, Sexes, Ages, Type, Tag.Type)Next, let’s have a look at the variables in dat:
# show data
dat %>%
dfround(3)## # A tibble: 451 x 8
## id Study Flight Bird.~1 Sexes Ages Type Tag.T~2
## <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 1 Ackerman Wildlife Soc Bull 2~ flap seabird both ad Surv~ VHF
## 2 2 Anich J Field Ornith 2009Surv~ flap passer~ both ad Surv~ VHF
## 3 3 Arbeiter Vogel Hidd 2012Survi~ flap passer~ both ad Surv~ GLS
## 4 4 Arlt PLoS one 2013Survival flap passer~ both ad Surv~ GLS
## 5 5 Bachler Condor 2007Survival flap passer~ both ad Surv~ VHF
## 6 6 Bachler PLoS one 2010Survival flap passer~ both ad Surv~ GLS
## 7 7 Bairlein Biol Letts 2012Survi~ flap passer~ both ad Surv~ GLS
## 8 8 Blackburn J Av Biol 2016Survi~ flap passer~ both ad Surv~ GLS
## 9 9 Boag J Wildl Manag 1973Surviv~ flap gamebi~ both ad Surv~ VHF
## 10 10 Brown Unpub 2011Survival flap passer~ both ad Surv~ GLS
## # ... with 441 more rows, and abbreviated variable names 1: Bird.Type,
## # 2: Tag.Type
Assume that you are clear about the meaning of each variable
(meta-data). Assume you want to make a Grid-like plot
to show the overlap between the variable Bird.Type and
Type. To be more precise, you want to have a evidence gap
map whereBird.Type (dimension1) is placed on the x-axis,
and Type (dimension2) is placed on the y-axis, with the
number of number of effects (sample size) displayed at the intersection
of the x- and y-axes.
Then, let’s try to have conversation with ChatGPT 3.5 as follows:
Assume you are an expert in visualization in R. I have example R code that can be used to make a nice figure.
The R code is as follows:
est_dat <- dat %>%
group_by(dimension1, dimension2) %>%
summarize(n_es = n()) # get the number of effect sizes for each combination of the two variables
ggplot(est_dat, aes(x = dimension1, y = dimension2, size = n_es)) +
geom_point(alpha = 0.5, color = RColorBrewer::brewer.pal(n = 8, name = "Dark2")[3]) +
labs(x = "Dimension 1 (Intervention)", y = "Dimension 2 (Outcome)") +
scale_size(range=c(5,10)) +
scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) +
scale_y_discrete(labels = function(x) str_wrap(x, width = 10)) +
theme_bw() +
guides(size = "none") +
theme(legend.position='top',
legend.justification='right',
legend.direction='horizontal',
axis.text = element_text(color = "black"),
axis.title = element_text(color = "black")) +
geom_text(aes(label = as.character(n_es)), size = 4, color = "gray10") +
labs(caption = "The value in the cell is the number of effect sizes") +
theme(plot.caption = element_text(size = 10, color = "gray10", face = "italic"),
axis.text = element_text(size = 12),
axis.title = element_text(face = "bold", size = 12))
Now, I want to apply this example R code to my own data named `dat`. The `dat` has two variables named `Bird.Type` and `Type`. Can you use the example R code with my own data `dat` by replacing the variable names dimension1 and dimension2 with Bird.Type and Type, respectively?
The ChatGPT 3.5 will return you the adapted R code:
# R code from ChatGPT 3.5 Load required libraries
library(dplyr)
library(ggplot2)
library(stringr)
# Get the number of effect sizes for each combination of Bird.Type and Type
est_dat <- dat %>%
group_by(Bird.Type, Type) %>%
summarize(n_es = n())
# Create the plot
ggplot(est_dat, aes(x = Bird.Type, y = Type, size = n_es)) + geom_point(alpha = 0.5,
color = RColorBrewer::brewer.pal(n = 8, name = "Dark2")[3]) + labs(x = "Bird Type",
y = "Type") + scale_size(range = c(5, 10)) + scale_x_discrete(labels = function(x) str_wrap(x,
width = 10)) + scale_y_discrete(labels = function(x) str_wrap(x, width = 10)) +
theme_bw() + guides(size = "none") + theme(legend.position = "top", legend.justification = "right",
legend.direction = "horizontal", axis.text = element_text(color = "black"), axis.title = element_text(color = "black")) +
geom_text(aes(label = as.character(n_es)), size = 4, color = "gray10") + labs(caption = "The value in the cell is the number of effect sizes") +
theme(plot.caption = element_text(size = 10, color = "gray10", face = "italic"),
axis.text = element_text(size = 12), axis.title = element_text(face = "bold",
size = 12))Nice! You get the grid-like plot where you can identify knowledge gaps by looking at the size of the bubble, thus setting the agenda for future research to fill such gaps. Certainly, you can make changes to the code and improve the figure according to your own preference, including adjusting the colors, labels, and other plot properties as needed to customize the plot according to your preferences.
Alluvial diagrams
With the same data (dat), we next teach you how to use
ChatGPT 3.5 to make an alluvial diagram.
Starts your dialogue with ChatGPT 3.5 as:
Assume you are an expert in visualization in R. I have example R code that can be used to make a nice figure.
The R code is as follows:
ggplot(dlong(dat, type_experiment, location, population, intervention_category, domain, intervention_technique), aes(x = x, next_x = next_x, node = node, next_node = next_node, fill = factor(node), label = node)) +
sankey_p(flow.alpha = 0.8,
node.color = "transparent") +
sankey_p_label(size = 3, color = "white", fill = "gray10", alpha = 0.6) +
ggsci::scale_color_tron() +
theme_sankey(base_size = 10) +
labs(x = NULL) +
theme(legend.position = "none",
plot.title = element_text(hjust = .5),
axis.text.x = element_text(color = "black", size = 9)) +
scale_x_discrete(labels = c("Moderator 1 \n (Experimental \n approach)", "Moderator 2 \n (Geographical \n location)", "Moderator 3 \n (Population \n characteristic)", "Moderator 4 \n (Architecture \n category)", "Moderator 5 \n (Behavioral \n domain)", "Moderator 6 \n (Intervention \n technique)"), position = "top")
Now, I want to apply this example R code to my own data named `dat`. The `dat` has two variables named `Flight`, `Sexes`, `Ages`, `Tag.Type`. Can you use the example R code with my own data `dat` by replacing the variables shown in the example code with the variables in my our data? Note that the function `dlong()`, `sankey_p()`, and `sankey_p_label()` are custom functions.
# code returned by ChatGPT 3.5
library(ggplot2)
# Replace the variables in aes() with your data variables
ggplot(dlong(dat, Flight, Sexes, Ages, Tag.Type), aes(x = x, next_x = next_x, node = node,
next_node = next_node, fill = factor(node), label = node)) + sankey_p(flow.alpha = 0.8,
node.color = "transparent") + sankey_p_label(size = 3, color = "white", fill = "gray10",
alpha = 0.6) + ggsci::scale_color_tron() + theme_sankey(base_size = 10) + labs(x = NULL) +
theme(legend.position = "none", plot.title = element_text(hjust = 0.5), axis.text.x = element_text(color = "black",
size = 9)) + scale_x_discrete(labels = c("Flight", "Sexes", "Ages", "Tag Type"),
position = "top")Nice! You get your alluvial plot. You can see the flow or overlaps in the level of important moderators or predictors, summarizing their connections and co-linearity and missing data patterns in an accessible manner. Again, you may need to adjust other plot parameters and styling options according to your preferences and requirements.
Altmetric analyses
We will use the alternative metric data associated with the
altmetrics_exercise(8) to illustrate how to conduct the
altmetric analysis by “asking” ChatGPT 3.5 to adapt the provided example
R code. The alternative data of the
altmetrics_exercise(8) were collected from Altmetrics
online service (www.altemetric.com). You can use the custom function
provided above to automatically extract Altmetric score, policy, and
patent citation for each study included in the meta-analytic evidence
(8). Alternatively, you can
extract these statistics manually from Altmetrics online service
(www.altemetric.com). Anyway, assume you have the
altmetrics_exercise, which contains Altmetric score,
policy, and patent citation.
# load altmetric data
altmetrics_exercise <- read.csv(here("Data", "altmetrics_exercise.csv"))
# remove NA
altmetrics_exercise <- na.omit(altmetrics_exercise)Ask ChatGPT 3.5:
Assume you are an expert in visualization in R. I have example R code that can be used to make a nice figure.
The R code is as follows:
altmetrics2 <- altmetrics %>% distinct(paper, .keep_all = TRUE)
altmetrics2 <- altmetrics2 %>% mutate(count = Policy + Patent)
altmetrics2 <- altmetrics2 %>% mutate(group = rep("",nrow(altmetrics2)))
ggplot2::ggplot() + ggbeeswarm::geom_quasirandom(data = altmetrics2, ggplot2::aes(y = Altmetric.score, x = group, size = count), fill = "#1B9E77", col="#999999", alpha=0.8, shape = 21) +
ggplot2::coord_flip() +
ggplot2::theme_bw() +
ggplot2::guides(fill = "none", colour = "none") +
ggplot2::theme(legend.position= c(0, 1), legend.justification = c(0, 1)) +
ggplot2::theme(legend.title = ggplot2::element_text(size = 9)) +
ggplot2::theme(legend.direction="horizontal") +
ggplot2::theme(legend.background = ggplot2::element_blank()) +
ggplot2::labs(x = "Social media interest", y = "Altmetric score", size = "Policy and patent citations") +
ggplot2::theme(axis.text.y = ggplot2::element_text(size = 10, colour ="black",hjust = 0.5, angle = 0))
Now, I want to apply this example R code to my own data named `altmetrics_exercise`. Can you replace the data `altmetrics` by own data `altmetrics_exercise`?
# Load packages
library(ggplot2)
library(ggbeeswarm)
# Assuming altmetrics_exercise is your own data frame
altmetrics2 <- altmetrics_exercise %>%
distinct(paper, .keep_all = TRUE)
altmetrics2 <- altmetrics2 %>%
mutate(count = Policy + Patent)
altmetrics2 <- altmetrics2 %>%
mutate(group = rep("", nrow(altmetrics2)))
ggplot() + geom_quasirandom(data = altmetrics2, aes(y = Altmetric.score, x = group,
size = count), fill = "#1B9E77", col = "#999999", alpha = 0.8, shape = 21) +
coord_flip() + theme_bw() + guides(fill = "none", colour = "none") + theme(legend.position = c(0,
1), legend.justification = c(0, 1)) + theme(legend.title = element_text(size = 9)) +
theme(legend.direction = "horizontal") + theme(legend.background = element_blank()) +
labs(x = "Social media interest", y = "Altmetric score", size = "Policy and patent citations") +
theme(axis.text.y = element_text(size = 10, colour = "black", hjust = 0.5, angle = 0))Awesome! You get the altmetric plot, showing the overall social media attention received by studies within a meta-analysis and the extent to which these studies are translated into practical applications. Of course, you can adjust the aesthetics and theme settings as needed to customize the plot appearance according to your preferences.
Gemini
Please help add
Grid-like plots
Alluvial diagrams
Altmetric analyses
Copilot
Please help add
Grid-like plots
Alluvial diagrams
Altmetric analyses
License
This documented is licensed under the following license: CC Attribution-Noncommercial-Share Alike 4.0 International.
Software and package versions
sessionInfo() %>%
pander()R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale: _LC_COLLATE=Chinese (Simplified)China.936, _LC_CTYPE=Chinese (Simplified)China.936, _LC_MONETARY=Chinese (Simplified)China.936, LC_NUMERIC=C and _LC_TIME=Chinese (Simplified)China.936
attached base packages: stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: ggbeeswarm(v.0.6.0), igraph(v.1.3.0), circlize(v.0.4.15), bibliometrix(v.4.0.2), flextable(v.0.8.6), ggtree(v.3.7.1.001), ggstance(v.0.3.5), ape(v.5.6-1), cowplot(v.1.1.1), rotl(v.3.0.11), formatR(v.1.11), pander(v.0.6.4), ggthemr(v.1.1.0), viridis(v.0.6.2), viridisLite(v.0.4.0), car(v.3.0-11), carData(v.3.0-4), lme4(v.1.1-26), metafor(v.4.2-0), numDeriv(v.2016.8-1.1), metadat(v.1.2-0), Matrix(v.1.5-3), readxl(v.1.3.1), ggpubr(v.0.4.0), DT(v.0.19), here(v.1.0.1), forcats(v.0.5.2), stringr(v.1.5.0), dplyr(v.1.0.10), purrr(v.0.3.4), readr(v.2.1.2), tidyr(v.1.2.1), tibble(v.3.1.8), ggplot2(v.3.4.4), tidyverse(v.1.3.1), rmdformats(v.1.0.3) and knitr(v.1.37)
loaded via a namespace (and not attached): pacman(v.0.5.1), utf8(v.1.2.2), tidyselect(v.1.2.0), htmlwidgets(v.1.5.3), FactoMineR(v.2.6), grid(v.4.0.3), clubSandwich(v.0.5.3), munsell(v.0.5.0), codetools(v.0.2-18), ragg(v.1.2.5), statmod(v.1.4.35), rentrez(v.1.2.3), withr(v.2.5.0), colorspace(v.2.0-0), highr(v.0.9), uuid(v.0.1-4), rstudioapi(v.0.13), leaps(v.3.1), ggsignif(v.0.6.3), officer(v.0.6.0), fontLiberation(v.0.1.0), labeling(v.0.4.2), emmeans(v.1.6.3), bit64(v.4.0.5), farver(v.2.1.0), rprojroot(v.2.0.2), coda(v.0.19-4), vctrs(v.0.5.0), treeio(v.1.14.4), generics(v.0.1.0), TH.data(v.1.1-0), xfun(v.0.29), fontquiver(v.0.2.1), R6(v.2.5.1), cachem(v.1.0.6), gridGraphics(v.0.5-1), assertthat(v.0.2.1), vroom(v.1.5.7), promises(v.1.2.0.1), scales(v.1.2.1), multcomp(v.1.4-17), beeswarm(v.0.4.0), gtable(v.0.3.0), multcompView(v.0.1-8), sandwich(v.3.0-1), rlang(v.1.1.1), systemfonts(v.1.0.4), scatterplot3d(v.0.3-41), dimensionsR(v.0.0.3), GlobalOptions(v.0.1.2), splines(v.4.0.3), rstatix(v.0.7.0), lazyeval(v.0.2.2), broom(v.1.0.1), yaml(v.2.2.1), abind(v.1.4-5), modelr(v.0.1.8), tidytext(v.0.3.4), crosstalk(v.1.1.1), backports(v.1.2.1), httpuv(v.1.6.2), tokenizers(v.0.2.1), tools(v.4.0.3), bookdown(v.0.24), pubmedR(v.0.0.3), ggplotify(v.0.0.9), ellipsis(v.0.3.2), jquerylib(v.0.1.4), RColorBrewer(v.1.1-3), latex2exp(v.0.9.4), plyr(v.1.8.6), Rcpp(v.1.0.8.3), rscopus(v.0.6.6), progress(v.1.2.2), prettyunits(v.1.1.1), openssl(v.1.4.4), zoo(v.1.8-9), haven(v.2.4.3), ggrepel(v.0.9.1), cluster(v.2.1.0), fs(v.1.5.2), factoextra(v.1.0.7), crul(v.1.3), magrittr(v.2.0.3), data.table(v.1.14.0), openxlsx(v.4.2.4), reprex(v.2.0.1), mvtnorm(v.1.1-3), SnowballC(v.0.7.0), bibliometrixData(v.0.3.0), hms(v.1.1.0), patchwork(v.1.1.1), mime(v.0.11), evaluate(v.0.14), xtable(v.1.8-4), XML(v.3.99-0.7), rio(v.0.5.29), shape(v.1.4.6), gridExtra(v.2.3), compiler(v.4.0.3), fontBitstreamVera(v.0.1.1), crayon(v.1.4.1), minqa(v.1.2.4), htmltools(v.0.5.2), ggfun(v.0.0.8), later(v.1.3.0), tzdb(v.0.1.2), aplot(v.0.1.8), lubridate(v.1.7.10), DBI(v.1.1.1), dbplyr(v.2.1.1), MASS(v.7.3-54), boot(v.1.3-28), cli(v.3.4.1), parallel(v.4.0.3), pkgconfig(v.2.0.3), flashClust(v.1.01-2), rncl(v.0.8.4), foreign(v.0.8-81), plotly(v.4.9.4.1), xml2(v.1.3.2), vipor(v.0.4.5), bslib(v.0.3.0), stringdist(v.0.9.7), estimability(v.1.3), rvest(v.1.0.1), yulab.utils(v.0.0.5), janeaustenr(v.1.0.0), digest(v.0.6.27), httpcode(v.0.3.0), rmarkdown(v.2.11), cellranger(v.1.1.0), tidytree(v.0.4.1), gdtools(v.0.3.1), curl(v.4.3.2), shiny(v.1.6.0), nloptr(v.1.2.2.2), lifecycle(v.1.0.3), nlme(v.3.1-151), jsonlite(v.1.7.2), askpass(v.1.1), fansi(v.0.5.0), pillar(v.1.8.1), ggsci(v.2.9), lattice(v.0.20-41), fastmap(v.1.1.0), httr(v.1.4.2), survival(v.3.2-7), glue(v.1.6.2), zip(v.2.2.0), bit(v.4.0.4), stringi(v.1.7.4), sass(v.0.4.0), textshaping(v.0.3.6), gfonts(v.0.2.0), memoise(v.2.0.0) and mathjaxr(v.1.2-0)